THE APPROXIMATE SOLUTION OF A SIMPLE CONSTRAINED SEARCH PATH
MOVING TARGET PROBLEM USING MOVING HORIZON POLICIES


Presented here are the results of applying moving horizon
policies to solve approximately a moving target problem in which
both the searcher and the target have constraints on their paths.
The solution procedure can be viewed as an approximation of the
optimal dynamic programming method of Eagle (1982). This
approximation may be useful if limits on available computer storage
or computer time do not allow calculation of the optimal solution.

Only one problem geometry was examined. The problem was
selected to keep the computer computations feasible rather than
to be representative of any real-world search. It is possible
that the patterns observed in the solution are specific to this
problem geometry. Further work is required to establish the
generality (or lack thereof) of these results.

1. The Problem

The target and searcher both move in discrete time among the
9 cells shown in Figure 1. The searcher starts in cell 1, and
the target starts in cell 9. In each time period the searcher
can move from his current cell to any adjacent cell. Cells are
adjacent if they share a common side. The searcher can also
choose to remain in his current cell. The target moves from cell
to cell according to a specified Markov transition matrix. The
probability of the target remaining in any cell i, given it was
in cell i in the previous time period, is .4. The probability
that the target transitions to any given cell adjacent to i is
$.6/c_i$, where $c_i$ is the number of cells adjacent to i. So the
target transition matrix has entries $p_{ii} = .4$, $p_{ij} = .6/c_i$
for each cell j adjacent to i, and $p_{ij} = 0$ otherwise.

If the searcher chooses the cell occupied by the target, then the
target is detected with probability .5. If the searcher chooses
a cell not occupied by the target, then the target cannot be
detected during that time period. The searcher has T time periods
in which to search. His problem is to select the T-time period
search path that minimizes the probability of target non-detection
(PND).
Figure 1. 9-cell search grid.
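
The transition rule of Section 1 can be turned into an explicit matrix
once a cell numbering is fixed. The following Python sketch assumes the
9 cells form a 3 x 3 grid numbered row by row (1, 2, 3 across the top
row); that numbering is an assumption made here for illustration, and
the function names are hypothetical rather than taken from the report.

```python
def grid_neighbors(rows=3, cols=3):
    """Map each cell (numbered 1..rows*cols, row by row; an assumed
    numbering) to the sorted list of cells sharing a side with it."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            cell = r * cols + c + 1
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    adjacent.append(rr * cols + cc + 1)
            neighbors[cell] = sorted(adjacent)
    return neighbors


def transition_matrix(neighbors, stay=0.4):
    """Build P with p_ii = stay and p_ij = (1 - stay) / c_i for each
    cell j adjacent to i, where c_i is the number of adjacent cells."""
    n = len(neighbors)
    P = [[0.0] * n for _ in range(n)]
    for i, adjacent in neighbors.items():
        P[i - 1][i - 1] = stay
        for j in adjacent:
            P[i - 1][j - 1] = (1.0 - stay) / len(adjacent)
    return P


NEIGHBORS = grid_neighbors()
P = transition_matrix(NEIGHBORS)
for row in P:
    print(" ".join(f"{p:.2f}" for p in row))
```

Under this numbering a corner cell has two neighbors (each reached with
probability .3), an edge cell has three (.2 each), and the center cell
has four (.15 each).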


2.   Moving  Horizon  Policies 

The problem presented was solved approximately using m-time
period moving horizon (m-TPMH) policies. Such a policy is
defined as follows: When T time periods remain in which to
search and T > m, the m-TPMH policy selects as the next search
cell that cell which would be optimal if m time periods remained
in the problem. When $T \le m$, the optimal search path is selected.
The 1-TPMH policy is called the myopic policy.

Moving horizon policies were introduced for the Markov decision
process by Shapiro (1969) and have been recently suggested
for search applications by Stewart (1984).

For this investigation, dynamic programming was used to construct
the (m+1)-TPMH policy from the m-TPMH policy. The details
are in Appendix A and Eagle (1982).


3.   Experimental  Results 

A total of 320 cases were examined using problem lengths T
(T = 1, 2, ..., 40) and m-TPMH policies (m = 1, 2, ..., 8). In
addition, the optimal solutions were obtained (using dynamic
programming and total enumeration) for T from 1 to 15 time periods.
Figures 2 through 7 illustrate some observations suggested by the
data collected.

Observation 1: For the moving horizon and optimal policies
examined, the decrease in PND with increasing T was "almost
asymptotically geometric."

Figures 2 through 6 illustrate "almost." In Figure 2, PND
is plotted on a logarithmic scale against T. It appears here
that PND for the myopic solution, the 8-TPMH solution, and the
optimal solution are very nearly asymptotically geometrically
decreasing. It is also apparent that the 8-TPMH policy generates
a PND which decreases more rapidly than that generated by the
myopic policy. Figures 3 and 4 show, however, that there is
some fine structure in the graphs of PND which is not apparent in
Figure 2. In Figure 3, the ratio PND(T)/PND(T-1) is plotted for
the myopic and 8-TPMH policies. Figure 4 is a similar plot with
an expanded y-axis scale. It appears that while the myopic
policy is asymptotically geometric, the 8-TPMH policy is not.
Graphs of PND(T)/PND(T-1) for the other moving horizon policies
tested show an "almost asymptotically geometric" pattern similar
to that of the 8-TPMH policy. (See Figures 5 and 6.)

Observation 2: It is possible for an $m_1$-TPMH policy to produce
a smaller PND than an $m_2$-TPMH policy when $m_1 < m_2$.

In general, m-TPMH policies performed better as m increased
from 1 to 8, but there were some exceptions. Figure 7 illustrates
this. Here the difference in PND produced by the 3- and 4-TPMH
policies is plotted against problem length T. A negative value of
this difference indicates that the 3-TPMH policy performed better
than the 4-TPMH policy for that particular value of T. For example,
for T = 11, the 3-TPMH policy produced a PND of .4426, while the
4-TPMH policy gave .4434. The difference of -.0008 is plotted
in Figure 7.

Observation 3: For $T \le 15$, the optimal and 8-TPMH policies
produced identical PND.

This is not to suggest that the 8-TPMH policy is optimal (it
is not; the 6-TPMH policy produced smaller values of PND for
some T), but rather that it may be a good approximately
optimal policy for this problem.

4. Looking for a Lower Bound to PND

Moving horizon policies provide an upper bound to the optimal
PND. It would be useful to construct a lower bound as well. If
for all T greater than or equal to some $\hat{T}$, the optimal policy
produced a non-decreasing PND(T)/PND(T-1) (as does the myopic
policy in this example for $\hat{T} = 3$), then

\[
\mathrm{PND}(T) \ge \mathrm{PND}(\hat{T})
  \left( \frac{\mathrm{PND}(\hat{T})}{\mathrm{PND}(\hat{T}-1)} \right)^{T-\hat{T}}
\]

for all $T \ge \hat{T}$. Unfortunately, the optimal policy in this example
did not generate non-decreasing PND(T)/PND(T-1). (See Figures
4 and 6.) The strongest statement about the optimal PND that
the data collected can support is apparently the following:

For all $\hat{T} \in \{1, 2, \ldots, 15\}$ there exists a maximum
$\gamma(\hat{T}) > 0$ satisfying

\[
\frac{\mathrm{PND}(T)}{\mathrm{PND}(T-1)} \ge \gamma(\hat{T}) .
\]

That is, for each $\hat{T}$, there was some maximum positive constant,
$\gamma(\hat{T})$, which defined the tightest geometrically decreasing lower
bound to PND(T), $T \ge \hat{T}$.

In addition, the data allow the following observation concerning
the moving horizon PND.

Observation 4: For the m-TPMH policies examined with $T \ge 10$,

\[
\frac{\mathrm{PND}(T)}{\mathrm{PND}(T-1)} \ge \frac{\mathrm{PND}(10)}{\mathrm{PND}(9)} .
\]

That is, for $T \ge 10$, the 1-time period geometric decrease in the
moving horizon PND(T) was bounded below by PND(10)/PND(9). If this
observation also holds for the optimal policy, then for $T \ge 15$
we have for the optimal policy,

\[
\begin{aligned}
\mathrm{PND}(T) &= \mathrm{PND}(15) \cdot
  \frac{\mathrm{PND}(16)}{\mathrm{PND}(15)} \cdot
  \frac{\mathrm{PND}(17)}{\mathrm{PND}(16)} \cdots
  \frac{\mathrm{PND}(T)}{\mathrm{PND}(T-1)} \\
&\ge \mathrm{PND}(15)
  \left( \frac{\mathrm{PND}(10)}{\mathrm{PND}(9)} \right)^{T-15} \\
&\ge .3308 \, (.9281)^{T-15} .
\end{aligned}
\tag{1}
\]

If (1) is a lower bound for this problem, it is a fairly tight
one. This possible lower bound is plotted in Figure 2. Figure 8
shows the difference between this possible bound and the PND
produced by the 8-TPMH, 2-TPMH and myopic policies. Figure 8 also
suggests that increasing m from 1 to 2 resulted in considerably
more policy improvement than did increasing m from 2 to 8.
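
The possible bound (1) is easy to tabulate. The Python sketch below
evaluates it using only the two constants quoted in (1); the function
and argument names are illustrative and do not come from the report,
and the values are meaningful only if (1) does in fact hold for the
optimal policy.

```python
def possible_lower_bound(T, pnd_15=0.3308, ratio=0.9281):
    """Evaluate the right-hand side of (1): pnd_15 * ratio**(T - 15).

    pnd_15 and ratio are the constants printed in (1); the expression
    is stated only for T >= 15.
    """
    if T < 15:
        raise ValueError("bound (1) is stated only for T >= 15")
    return pnd_15 * ratio ** (T - 15)


for T in (15, 20, 30, 40):
    print(T, round(possible_lower_bound(T), 4))
```

Because the ratio is below 1, the tabulated bound decreases
geometrically in T, which is the behavior plotted in Figure 2.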

Appendix A: The Dynamic Programming Procedure for Determining
Moving Horizon Policies

We make the following definitions:

$C$ = set of all cells = $\{1, 2, \ldots, N\}$,

$C_j$ = set of all cells accessible in 1 time period to a searcher
in cell j,

$q_j$ = P{target detection | target in cell j and search conducted
in cell j},

$p_{ij}$ = P{target transitions in 1 time period from cell i to
cell j},

$P$ = target transition matrix = $[p_{ij}] \in R^{N \times N}$,

$d_n$ = the cell searched when n time periods remain in the
problem,

$\delta_n$ = $(d_n, d_{n-1}, \ldots, d_1)$ = an n-time period search path,

$\pi_j$ = probability that the target is in cell j,

$\pi$ = $(\pi_1, \pi_2, \ldots, \pi_N)$ = target probability distribution over C.

With any n-time period search path, $\delta_n$, there can be
associated a vector $a \in R^N$ such that $a_i$ = P{target detection |
$\delta_n$ is followed; target in cell i when search begins}. The
probability of detection when $\delta_n$ is followed and the initial
target distribution is $\pi$ is then $\pi a$. Now let A(n,i) be the set
of vectors associated with all possible $\delta_n$, given the searcher is
in cell i when n time periods remain. Then the maximum obtainable
n-time period probability of detection given an initial target
distribution of $\pi$ is

\[
V_n(\pi, i) = \max_{a \in A(n,i)} \pi a . \tag{A1}
\]

And the optimal n-time period search path is that $\delta_n$ associated
with the maximizing $a \in A(n,i)$.

The dynamic programming problem is then to construct the
vector sets A(n+1,1), A(n+1,2), ..., A(n+1,N) from the vector sets
A(n,1), A(n,2), ..., A(n,N). Also, each $a \in A(n+1,i)$ must have
associated with it an (n+1)-time period search path.

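Let a be any element of A(n,j) and $\delta_n$ be the n-time period
search path associated with a. Now the N-vector associated with
the (n+1)-time period search path $(j, \delta_n)$ is

\[
\tilde{a} = e_j q_j + P_j a ,
\]

where $e_j \in R^N$ is the j-th unit vector and $P_j \in R^{N \times N}$ is P
with row j multiplied by $(1 - q_j)$. To see this, interpret the
components of $\tilde{a}$ and a as probabilities of detection when n+1
and n searches respectively remain in the problem. A target in
cell j is detected immediately with probability $q_j$; with
probability $1 - q_j$ it survives the search, transitions according
to row j of P, and is then detected with the probabilities in a. A
target in any other cell cannot be detected immediately, so it
simply transitions and is then detected with the probabilities in a.
The entire set A(n+1,i) is then

\[
\{ \tilde{a} \in R^N \mid \tilde{a} = e_j q_j + P_j a ;\ j \in C_i \text{ and } a \in A(n,j) \} . \tag{A2}
\]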
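
One stage of the recursion (A2) can be sketched in a few lines of
Python. The sketch below is illustrative only: cells are indexed from 0
rather than 1, each set A(n,i) is stored as a list of (vector, path)
pairs, the names `extend` and `next_stage` are hypothetical, and the
2-cell example data are invented to keep the arithmetic easy to check.
No domination test is applied here.

```python
def extend(a, j, q, P):
    """Return a_tilde = e_j q_j + P_j a, where P_j is P with row j
    multiplied by (1 - q_j). Cells are 0-indexed in this sketch."""
    n = len(P)
    a_tilde = []
    for k in range(n):
        later = sum(P[k][l] * a[l] for l in range(n))  # (P a)_k
        if k == j:
            a_tilde.append(q[j] + (1.0 - q[j]) * later)
        else:
            a_tilde.append(later)
    return a_tilde


def next_stage(A_n, accessible, q, P):
    """Apply (A2): build every A(n+1, i) from the sets A(n, j), j in C_i.
    A_n[j] is a list of (vector, path) pairs; paths are tuples of cells."""
    A_next = {}
    for i, cells in accessible.items():
        A_next[i] = [
            (extend(a, j, q, P), (j,) + path)
            for j in cells
            for a, path in A_n[j]
        ]
    return A_next


# Invented 2-cell example: either cell is accessible from either cell.
P = [[0.4, 0.6], [0.6, 0.4]]
q = [0.5, 0.5]
accessible = {0: [0, 1], 1: [0, 1]}

A0 = {i: [([0.0, 0.0], ())] for i in accessible}  # A(0,i) = {0}
A1 = next_stage(A0, accessible, q, P)
A2 = next_stage(A1, accessible, q, P)
print(A2[0][0])
```

In this example the first element of `A2[0]` belongs to the path that
searches cell 0 twice: a target starting in cell 0 is detected with
probability .5 + .5(.4)(.5) = .6, and a target starting in cell 1 with
probability (.6)(.5) = .3.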

The dynamic programming process begins by setting
$A(0,i) = \{0\}$, $0 \in R^N$, i = 1, 2, ..., N. One iteration gives the
myopic solution. Specifically, applying (A2) when A(0,i) = {0} yields

\[
A(1,i) = \{ e_j q_j : j \in C_i \}, \quad i = 1, \ldots, N ,
\]

with an associated 1-time period search path of $\delta_1 = d_1 = j$.
Continued application of (A2) allows recursive construction of
the sets A(n,i) with an n-time period search path associated
with each vector in each set.
The set A(n,i) constructed in this manner from the sets
A(n-1,j), $j \in C_i$, may contain some vectors which will never
maximize (A1) for any target distribution $\pi$. The $\delta_n$ associated
with each of these "dominated" vectors cannot be an optimal
n-time period search path. To test whether a vector $a \in A(n,i)$
is dominated, the following linear program is solved:

\[
\begin{aligned}
\min_{\pi, x} \quad & x - \pi a \\
\text{s.t.} \quad & x \ge \pi a', \quad a' \in A(a), \\
& \pi \in \Pi ,
\end{aligned}
\]

where A(a) is the set A(n,i) less the vector a, and
$\Pi = \{ \pi \in R^N \mid \pi_i \ge 0 \text{ and } \sum_i \pi_i = 1 \}$. Whenever the minimal
value of $x - \pi a$ is non-negative, a is dominated and can be removed
from A(n,i). Only the non-dominated vectors in A(n,i) need be
used to construct A(n+1,j). Letting B be the convex hull of
A(a), Eagle (1982) showed that a is dominated if and only if
there exists some $b \in B$ such that $b \ge a$.

A simpler domination procedure is to remove a from A(n,i)
whenever there exists a vector $a' \in A(a)$ such that $a' \ge a$. This
method is easier to implement than the linear programming
procedure, but does not reduce A(n,i) to its minimum size. Thus
more computer storage is required to save A(n,i) in each stage
of the dynamic program.
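
The simpler, componentwise domination test can be sketched as follows
in Python. This is an illustration under stated assumptions rather than
the report's implementation: vectors are plain tuples, the name
`prune_pairwise` is hypothetical, and exact duplicates are handled by
keeping the first copy, a detail the text does not address.

```python
def prune_pairwise(vectors):
    """Drop every vector a for which some other vector b satisfies
    b >= a in every component. Of several identical vectors, only the
    first is kept (a tie-breaking choice made for this sketch)."""
    kept = []
    for idx, a in enumerate(vectors):
        dominated = False
        for jdx, b in enumerate(vectors):
            if jdx == idx:
                continue
            if all(bk >= ak for ak, bk in zip(a, b)):
                # b dominates a; an exact duplicate only counts if it
                # appears earlier in the list.
                if b != a or jdx < idx:
                    dominated = True
                    break
        if not dominated:
            kept.append(a)
    return kept


candidates = [(0.5, 0.0), (0.0, 0.5), (0.4, 0.0), (0.5, 0.0), (0.2, 0.2)]
print(prune_pairwise(candidates))
```

Here (0.4, 0.0) and the repeated (0.5, 0.0) are removed, but (0.2, 0.2)
survives even though the convex combination (0.25, 0.25) of the first
two vectors exceeds it in both components. Only the linear programming
test would remove it, which is the storage penalty described above.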

Once the vector sets A(m,i), i = 1, ..., N, have been constructed
and a $\delta_m$ has been associated with each $a \in A(m,i)$, then
the m-TPMH policy is available. Assume n > m time periods remain
in the problem, the searcher is in cell i, and the target
distribution is $\pi$. Then the m-TPMH policy picks as $d_n$ the first
element of $\delta_m$, where $\delta_m$ is the m-time period search path
associated with

\[
\arg\max_{a \in A(m,i)} \pi a . \tag{A3}
\]

If the target is not detected in time period n, the target
distribution is given a Bayesian update for the unsuccessful search
and (A3) is used again to determine $d_{n-1}$. When the problem
solution progresses to the point where m time periods remain,
the m-TPMH policy picks the optimal $\delta_m$ for the remaining time
periods.
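
A single decision of the m-TPMH policy, followed by the update after an
unsuccessful search, might look like the Python sketch below. Several
details are assumptions made for illustration: cells are indexed from
0, a set A(m,i) is passed in as a list of (vector, path) pairs, the
function names are hypothetical, and the update conditions on the
failed search first and then applies one target transition, which is
the ordering implied by (A2). The 2-cell data are invented.

```python
def choose_next_cell(pi, candidates):
    """Apply (A3): among the (vector, path) pairs standing in for
    A(m, i), find the vector maximizing pi . a and return the first
    cell of its associated path."""
    def value(pair):
        vector, _path = pair
        return sum(p * a for p, a in zip(pi, vector))

    _best_vector, best_path = max(candidates, key=value)
    return best_path[0]


def update_after_miss(pi, j, q, P):
    """Bayesian update of the target distribution after an unsuccessful
    search of cell j, followed by one Markov transition of the target."""
    n = len(pi)
    # Weight each cell by the probability of no detection given the
    # target is there, then renormalize.
    unnormalized = [pi[k] * (1.0 - q[j]) if k == j else pi[k] for k in range(n)]
    total = sum(unnormalized)
    posterior = [u / total for u in unnormalized]
    # Move the target one step: new_pi = posterior * P (row vector).
    return [sum(posterior[k] * P[k][l] for k in range(n)) for l in range(n)]


# Invented 2-cell example with a 1-period horizon (the myopic policy).
P = [[0.4, 0.6], [0.6, 0.4]]
q = [0.5, 0.5]
A_m_i = [([0.5, 0.0], (0,)), ([0.0, 0.5], (1,))]

pi = [0.2, 0.8]
cell = choose_next_cell(pi, A_m_i)
new_pi = update_after_miss(pi, cell, q, P)
print(cell, new_pi)
```

Repeating these two steps while more than m time periods remain, and
then following the optimal m-period path, reproduces the policy
described above.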